ABSTRACT

This paper presents our Linked Open Data (LOD) infras- 
tructures for genomic and experimental data related to mi- 
croRNA biomolecules. Legacy data from two well-known mi- 
croRNA databases with experimental data and observations, 
as well as change and version information about microRNA 
entities, are fused and exported as LOD. Our LOD server
helps biologists explore biological entities and their
evolution, and provides a SPARQL endpoint for applications and
services to query historical miRNA data and track changes,
their causes and effects.

1. INTRODUCTION 

Advances in scientific hardware (sensors, new-generation
sequencers, etc.), together with the explosion of Web 2.0
technologies, have completely changed the way scientists
create, disseminate and consume large volumes of information
and new content. More and more scientific datasets break the
walls of "private" management within their production site,
are published, and become available to potential data
consumers, i.e., individual users, scientific communities, and
applications/services. Typical examples include experimental
or observational data and scientific models from the life
sciences, climate, earth science, astronomy, etc.

Linked Data¹ is a compelling approach for the dissemination
and re-use of scientific data, realizing the vision of the
so-called Linked Science². The Linked Data paradigm involves
practices to publish, share, and connect data on the Web, and
offers a new way of data integration and interoperability.
Briefly, Linked Data is about using the Web to create links
between data from different sources. The driving force

1 http://linkeddata.org/ 
2 http://linkedscience.org/ 
to implement Linked Data spaces is the RDF technology.
The basic principles of the Linked Data paradigm are (a) to
use the RDF data model to publish structured data on the Web,
and (b) to use RDF links to interlink data from different data
sources. The aim of Linked Data technologies is to give
rise to the Web of Data.

The Web of Data is impelled by the current trend towards 
an open Web. The open data movement is a significant and 
emerging force towards this direction. Open science data 
is open data related to observations and results of scientific 
activities, which are publicly available for anyone to analyze 
and reuse. 

However, just converting legacy scientific data to Linked
Open Data (LOD) does not fully meet the requirements of data
re-use. To ensure re-use and allow exploitation and
validation of scientific results, several challenges related to
scientific data dynamics should be tackled. Scientific data
are diverse and evolve over time. Users and services (a) should
have access not only to up-to-date scientific LOD bases but
also to any of the previous versions of those bases, and (b)
should be able to track the changes among versions, as well as
their causes and effects.

In this work, we present our LOD services for life science 
data, and more specifically, genomic and experimental data 
related to microRNA biomolecules (see Section 2). Legacy 
data from two well-known microRNA databases are fused 
and exported as LOD. The first database (see Section 3) pro- 
vides experimental data and observations, while the second 
one (see Section 4) provides change and version information 
about microRNA entities. Our LOD services provide the 
following facilities: 

• Biologists can explore biological entities, their char- 
acteristics, and related experimental data with up-to- 
date information. 

• Services and applications can retrieve the same (up-to-date)
information as above by using our server as a SPARQL endpoint.

• Biologists can retrieve out-of-date resource descriptions, 
navigate between previous and next versions of the 
resources, see the changes involved, their causes and 
their effects on the resources. 

• Services and applications can retrieve historic information
as above by using our server as a SPARQL endpoint.

The system has been built using the D2R LOD infrastructure³.
All services are available at
http://diwis.imis.athena-innovation.gr/mlod.

Related Work. Our approach is tailored to the scientific
domain of life science data, and more specifically to genomic
and experimental data related to microRNA biomolecules.
Several attempts have recently been made to provide scientific
LOD services. W3C has established the Semantic Web Health Care
and Life Sciences Interest Group (HCLS)⁴, aiming to exploit
Semantic Web technologies for the management and the
representation of biological, medical and health care data.
The HCLS group works on the Linking Open Drug Data (LODD)
project, which provides linked RDF data exported from several
data sources such as ClinicalTrials.gov, DrugBank, DailyMed,
etc. Additionally, Bio2RDF⁵ provides linked RDF data produced
from over 30 biological data sources. Some earlier efforts
include YeastHub, LinkHub, BioDash [7] and BioGateway⁶.
Finally, Chem2Bio2RDF [2] integrates chemical and biological
information. Also, several chemogenomics repositories have
been transformed into RDF and linked to Bio2RDF and LODD RDF
resources.

In the context of LOD, numerous approaches have been
proposed to study the problems of evolution, versioning, and
change detection. In particular, [10] coins the term dataset
dynamics, essentially addressing content and interlinking
changes in linked data sources. In [11], a comparative study
of the approaches and tools for detecting, propagating and
describing changes in LOD resources and datasets is provided.
In [8], the authors deal with changes in the linkage between
datasets and specifically with the problem of broken links. A
similar approach is the Silk linking framework [12], which is
used for discovering and maintaining data links between web
data sources. Regarding versioning and temporal approaches to
LOD, in [H] the Memento framework is introduced as a resource
versioning mechanism for LOD. Finally, the authors of [4]
propose linked timelines, a temporal representation and
management approach for LOD.

2. BACKGROUND 

Biologists used to consider proteins and DNA as movers 
and shakers in genomics, seeing RNA as nothing more than 
a messenger to carry information between the two. This
changed dramatically after the discovery, in the early 2000s,
of the key role played in gene expression by small RNA
molecules, called microRNAs (miRNAs). miRNAs can completely
silence proteins. They do so by binding themselves to
complementary sequences on messenger RNA (mRNA) transcripts,
called targets. The knowledge of miRNA targets
(i.e., which mRNA transcripts are targeted by a miRNA) 
is important for therapeutic uses. For example, based on 
such knowledge, biologists can shut off genes by delivering 
artificial miRNA molecules into cells. 

The first miRNA molecules were identified in 1993. Since
then, there has been a dramatic increase in the number of
miRNAs discovered and registered in miRBase⁷, a searchable
database of published miRNA sequences and annotation.
However, there is a lack of high-throughput experimental
methods for identifying miRNA targets. Thus, computational
methods to predict targets have become increasingly important,
and led to the experimental identification of many miRNA
targets.

3 http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
4 http://www.w3.org/blog/hcls/
5 http://bio2rdf.org/
6 http://www.semantic-systems-biology.org/biogateway
7 http://www.mirbase.org/


Our team in IMIS/"Athena" R.C. and the DNA Intelligent
Analysis (DIANA) group of "Alexander Fleming" B.S.R.C.⁸
have developed a set of advanced Web applications to provide
access to computationally predicted miRNA targets. Since its
original launch, the DIANA Web app has been one of the most
widely used services for miRNA analysis. It includes the
following two core services.

microT⁹. The service provides target prediction data for
1884 miRNAs and more than six million predicted target
genes, organized in a relational database. Besides the target
prediction experimental results, we provide functional
analysis of miRNAs and genes that goes beyond simple
biological pathways, such as the relation of miRNAs to
functional features, diseases and medical descriptors. All
retrieved miRNAs are associated with diseases, using textual
information from PubMed¹⁰, a well-known digital library for
biomedical literature.

mirGen¹¹. The service provides information about
transcripts, and their transcription factors (TF), that
correspond to miRNAs. A transcription factor is a protein that
binds to specific DNA sequences, thereby controlling the
flow of genetic information from DNA to mRNA. The mirGen
database stores information about 811 human genes, 1270
human miRNAs, 386 mouse genes and 1012 mouse miRNAs,
organized in a relational database.

3. DATABASE OVERVIEW 

Next, we present an overview of the miRNA database
maintained by our team in IMIS/"Athena" R.C. and the DIANA
group of "A. Fleming" B.S.R.C., storing info about
computationally predicted miRNA targets produced by the
target prediction algorithm proposed by the DIANA group [6].

To better understand the miRNA domain and the DB schema
design, we first clarify some terminology. Since the term
"miRNA" is nowadays used in a wide scope, it is common to
distinguish between hairpin miRNAs and mature miRNAs, or
just hairpins and matures from now on. The former signifies
the genomic location of the latter. A hairpin is actually
processed into several matures. Matures bind themselves to
transcripts and prevent the creation of functional ribosomes
(and, thus, prohibit protein construction). A transcript is a
stretch of DNA transcribed into an RNA molecule (messenger
RNA, ribosomal RNA, transfer RNA, etc.).

The miRNA database has some core tables to store the
key entities of the miRNA domain (hairpins, matures,
transcripts and protein-encoding genes) and model their
relationships (see Table 1 for a part of the miRNA database
schema). There are also tables storing info about KEGG
pathways¹² and tissues. KEGG pathways is a collection of
manually drawn pathway maps, with textual descriptions,
representing biologists' knowledge on molecular interaction
and reaction networks.



8 http://www.fleming.gr/
9 http://diana.cslab.ece.ntua.gr/DianaTools/index.php?r=microtv4
10 http://www.ncbi.nlm.nih.gov/pubmed/
11 http://diana.cslab.ece.ntua.gr/?sec=databases
12 http://www.genome.jp/kegg/pathway.html





Core Tables            Column Description
---------------------  --------------------------------------------
Hairpins               id (mima_id), name, sequence, species,
                       gene location info, etc.
Matures                id (mimat), name, sequence, species.
Transcripts            tid, id given from ensembl.org (enstid),
                       species, DNA strand, gene location info, etc.
ProteinGenes           id given from ensembl.org (ensgid), name,
                       description.
Keggs                  id given from genome.jp (kegg_id).
Tissues                name, species.

Join Tables            Column Description
---------------------  --------------------------------------------
MatureHairpinConn      It relates matures and hairpins.
MicroT5Interactions    It contains all the experimentally verified
                       gene-mature interactions (bindings).
ProteinGeneKeggConn    It relates genes to kegg pathways.
MatureTissueConn       It relates matures to tissues.

Table 1: Part of miRNA database schema.



4. CHANGE AND VERSION MANAGEMENT

The miRBase database is a searchable database of published
miRNA sequences and annotation. It maintains info for 18443
hairpins and 49670 matures. Each entry in miRBase represents a
predicted hairpin miRNA with information on the location and
sequence of the corresponding mature miRNA sequence. Hairpins,
mature miRNAs and the relationships between them change over
time. miRBase maintains a list of files that record successive
versions along with the changes between them. A short
description of each file follows.

• miRNA.dat It maintains info related to all known hairpins
(like ID, name, related matures, related publications,
sequence, etc.) at the time of each version. Every new version
of miRNA.dat extends the previous one with all the newly
discovered miRNAs and omits the deleted ones. Example entries
of miRNA.dat are shown in Figure 1, where info about the
hairpin with name cel-let-7 and id (i.e., key) MI0000001 is
presented.

• miRNA.diff It tracks change operations on hairpins and
matures. Each version of miRNA.diff refers to a certain time
period and tracks changes only for that period.

Example entries of miRNA.diff are shown in Figure 1. For
instance, MI0000001 cel-let-7 NEW means that the hairpin with
ID MI0000001 and name cel-let-7 is created. Also, MI0004476
mdv2-miR-M29-5p SEQUENCE NAME means that the hairpin with ID
MI0004476 has changed its name (to mdv2-miR-M29-5p) and its
sequence. Note that to find the old name and the old sequence,
we should refer to the older version of the miRNA.dat file,
where hairpin names and info about sequences are available.
Similarly, MIMAT0000115 dme-miR-10* SEQUENCE NAME means that
the mature with ID MIMAT0000115 has changed its name (to
dme-miR-10*) and its sequence. Note that IDs starting with
"MIMAT" refer to matures.

• miRNA.dead It keeps all deleted hairpins at the time of a
version. It is maintained incrementally. Deletion means either
getting rid of a hairpin (e.g., one incorrectly characterized
in previous versions) or replacing a hairpin with another one.
For the latter case, links to existing hairpins are provided.
Contrary to deleted hairpins, deleted mature miRNAs are not
stored in the miRNA.dead file.

Example entries of miRNA.dead are shown in Figure 1. For
instance, the hairpin with ID MI0000104 and name hsa-mir-101-9
has been deleted. The reason is that it was a duplicate entry
(see the comment in the CC field). There is a hairpin
(MI0000739), though, that replaces the deleted one (see the FW
field).

• miFam.dat It stores info about hairpin families at the time
of a version. Hairpins that produce similar mature miRNAs
belong to the same family. It is maintained incrementally.
Example entries of miFam.dat are shown in Figure 1. For
instance, the hairpins with IDs MI0011482 (name bta-mir-677)
and MI0004634 (name mmu-mir-677) belong to the same family,
mir-677.



Entries in miRNA.dat

    ID   cel-let-7   standard; RNA; CEL; 99 BP.      (hairpin name)
    XX
    AC   MI0000001;                                  (hairpin ID)
    XX
    DE   Caenorhabditis elegans let-7 stem-loop
    XX
    RN   [1]                                         (related publication)
    RX   PUBMED; 11679671.
    RA   Lau NC, Lim LP, Weinstein EG, Bartel DP;
    RT   "An abundant class of tiny RNAs with probable regulatory roles in
    RT   Caenorhabditis elegans";
    RL   Science. 294:858-862(2001).
    XX
    DR   RFAM; RF00027; let-7.
    DR   WORMBASE; C05G5/12462-12364; .
    XX
    CC   let-7 is found on chromosome X in ..
    CC   the translational repression of thes..
    CC   to late-larval and adult stages [2].
    XX
    FH   Key             Location/Qualifiers
    FH
    FT   miRNA           17..38                      (related mature miRNA)
    FT                   /accession="MIMAT0000001"
    FT                   /product="cel-let-7"
    FT                   /evidence=experimental
    FT                   /experiment="cloned [1-3,5], ..
    FT                   [6]"
    FT   miRNA           61..82                      (related mature miRNA)
    FT                   /accession="MIMAT0000028"
    FT                   /product="cel-miR-56"
    FT                   /evidence=experimental
    FT                   /experiment="cloned [1-3,5], ..
    FT                   [6]"
    XX
    SQ   Sequence ..                                 (sequence info)
         uacacugugg...
         uaugcaauuu...

Entries in miFam.dat

    AC   MIPF0000811
    ID   mir-677
    MI   MI0011482   bta-mir-677
    MI   MI0004634   mmu-mir-677

Entries in miRNA.dead

    AC   MI0000104
    ID   hsa-mir-101-9
    FW   MI0000739
    CC   Duplicate entry removed.

Entries in miRNA.diff

    MI0000001      cel-let-7         NEW
    MI0004476      mdv2-miR-M29-5p   SEQUENCE NAME
    MIMAT0000115   dme-miR-10*       SEQUENCE NAME

Figure 1: File examples of tracking miRNA changes.

We have examined all files and recorded the following
types of changes for hairpins: (a) NEW: a new hairpin is
created, (b) NAME: a hairpin changes its name, (c) SEQUENCE
(SEQ): a hairpin changes its sequence, (d) NAME/SEQUENCE (NS):
a hairpin changes both its name and sequence at the same time,
(e) FORWARD (FW): a hairpin is deleted, but miRBase gives a
link to another hairpin for replacement, and (f) DELETE (DEL):
a hairpin is deleted (no replacement).

Similarly, we have identified the following types of changes
for matures: (a) NEW: a new mature is created, (b) NAME:
a mature changes its name, (c) SEQUENCE (SEQ): a mature
changes its sequence, (d) NAME/SEQUENCE (NS): a mature changes
both its name and sequence at the same time, (e) ADD PARENT
HAIRPIN (APH): a new hairpin is added to the list of hairpins
that produce a mature, (f) REMOVE PARENT HAIRPIN (RPH): a
hairpin is removed from the list of hairpins that produce a
mature, and (g) DELETE (DEL): a mature is deleted.
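
As an illustration of how miRNA.diff entries could be turned into the change types listed above, the following Python sketch parses lines of the form shown in Figure 1. It is a minimal sketch and not the loader used in our system: the Change class, the parse_diff_line helper and the SHORT_CODES table are hypothetical names, and only the tokens NEW, NAME and SEQUENCE are taken from the examples in the text.

```python
from dataclasses import dataclass

# Short codes used in the text for renamed or combined operations.
# Only SEQUENCE and NAME are attested by the Figure 1 examples; any other
# single token (e.g. NEW) is passed through unchanged, as an assumption.
SHORT_CODES = {
    frozenset({"SEQUENCE"}): "SEQ",
    frozenset({"NAME", "SEQUENCE"}): "NS",
}


@dataclass(frozen=True)
class Change:
    accession: str  # e.g. "MI0000001" (hairpin) or "MIMAT0000115" (mature)
    name: str       # entity name after the change
    entity: str     # "hairpin" or "mature"
    code: str       # change type, e.g. NEW, NAME, SEQ, NS


def parse_diff_line(line: str) -> Change:
    """Parse one whitespace-separated miRNA.diff entry (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 3:
        raise ValueError(f"expected accession, name and operation(s): {line!r}")
    accession, name = fields[0], fields[1]
    operations = frozenset(fields[2:])
    # Accessions of matures start with "MIMAT"; all others are hairpins.
    entity = "mature" if accession.startswith("MIMAT") else "hairpin"
    if operations in SHORT_CODES:
        code = SHORT_CODES[operations]
    elif len(operations) == 1:
        code = next(iter(operations))
    else:
        raise ValueError(f"unsupported operation combination: {line!r}")
    return Change(accession, name, entity, code)


if __name__ == "__main__":
    for entry in (
        "MI0000001 cel-let-7 NEW",
        "MI0004476 mdv2-miR-M29-5p SEQUENCE NAME",
        "MIMAT0000115 dme-miR-10* SEQUENCE NAME",
    ):
        print(parse_diff_line(entry))
```

Note that the old name and sequence are not available from such an entry alone; as explained above, they have to be looked up in the previous version of miRNA.dat.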



To manage change and version info, we maintain two history
tables: HairpinsHistory and MaturesHistory. Tables 2 and 3
show how change and version info is maintained in the history
tables. For each hairpin change, HairpinsHistory keeps a
record with, among others, the hairpin id, the type of change,
the version number where the change occurred
(first_appearance), and the last version before the next
change occurs (last_appearance). For example, the hairpin with
id ..1364 is first created in version 13. In version 16, it
changes name from dre-mir-10b to dre-mir-10b-1. No other
change occurs until version 18, where its sequence changes.
Another sequence change occurs in version 20. Similarly, the
mature with id ..9477 is first created in version 28, getting
the name bfl-miR-79 and the parent hairpin ..021. In version
30, it changes name (to bfl-miR-9-3p) and sequence.



mima_id  change  name           seq    first_appearance  last_appearance
-------  ------  -------------  -----  ----------------  ---------------
..1364   NEW     dre-mir-10b    ..X..  13                15
..1364   NAME    dre-mir-10b-1  ..X..  16                17
..1364   SEQ     dre-mir-10b-1  ..Y..  18                19
..1364   SEQ     dre-mir-10b-1  ..Z..  20                32

Table 2: Table HairpinsHistory: record samples.



mimat   change  name          seq  par. hairpin  first_appearance  last_appearance
------  ------  ------------  ---  ------------  ----------------  ---------------
..9477  NEW     bfl-miR-79    .X.                28
..9477  APH     bfl-miR-79    .X.  ..021         28                29
..9477  NS      bfl-miR-9-3p  .Y.                30                32

Table 3: Table MaturesHistory: record samples.



5. PUBLISHING LINKED OPEN MIRNA DATA 
5.1 LOD technology adopted 

To publish miRNA and miRBase databases as LOD, we 
adopted the "virtual RDF" approach: accessing a non-RDF 
database using an RDF view. Such an approach enables 
the access of non-RDF, legacy databases without having to 
replicate the whole database into RDF. The D2R server [1]
is a popular tool that follows the "virtual RDF" approach
for publishing the content of relational databases on the
Semantic Web. Database content is mapped to RDF using
the D2RQ declarative mapping language that captures map- 
pings between database schemas and RDFS/OWL schemas. 

A D2RQ mapping specifies how RDF resources are identified
and how RDF property values are generated from database
content. Mappings in D2RQ are declared based on ClassMaps and
PropertyBridges. A ClassMap maps a set of database records to
an RDF class of resources. Resources are assigned URIs using
URI patterns. The pattern hairpins/@@diana_hairpins.mima_id@@,
for instance, produces a relative URI like hairpins/MI0000005
by inserting a value from the column mima_id of table
diana_hairpins of the miRNA database into the pattern. The D2R
Server turns relative URIs into absolute URIs by expanding
them with the server's base URI. If a database already
contains URIs for identifying database content, then these
external URIs can be used instead of pattern-generated URIs.
The following ClassMap definition creates the class of hairpin
resources, and assigns them URIs using their ids from the
miRNA database:



map:Hairpins a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "hairpins/@@diana_hairpins.mima_id@@";
    d2rq:class diana:Hairpin;
    d2rq:classDefinitionLabel "Hairpin".

Each ClassMap has a set of PropertyBridges which specify
how the properties of an RDF instance are created. Property
values can be literals, URIs or blank nodes, and can be
created directly from database values or by employing
patterns. The following PropertyBridge definition creates the
property diana:name. Values for that property are created
from the name column of table diana_hairpins:

map:diana_hairpins_name a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:Hairpins;
    d2rq:property diana:name;
    d2rq:propertyDefinitionLabel "Hairpins name";
    d2rq:column "diana_hairpins.name".

Note that D2R provides flexible mappings of complex re- 
lational structures, allowing SQL statements directly in the 
mapping rules. The resulting record sets are grouped after- 
wards and the data is mapped to the created instances. 
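
To make the URI-pattern mechanism described above concrete, the following Python sketch expands a pattern of the form prefix/@@table.column@@ against one database row and prepends a base URI. It only illustrates the idea and is not the D2R Server implementation: the function expand_uri_pattern, the row representation (a dictionary keyed by "table.column") and the base URI http://example.org/resource/ are assumptions made for this example, and value escaping is omitted.

```python
import re

# Matches D2RQ-style placeholders such as @@diana_hairpins.mima_id@@.
PLACEHOLDER = re.compile(r"@@(\w+)\.(\w+)@@")


def expand_uri_pattern(pattern: str, row: dict, base_uri: str) -> str:
    """Fill each @@table.column@@ placeholder from `row` and make the URI absolute.

    `row` maps "table.column" keys to values (a simplification chosen for
    this sketch; it is not how D2R represents result rows internally).
    """
    def substitute(match: re.Match) -> str:
        key = f"{match.group(1)}.{match.group(2)}"
        return str(row[key])

    relative_uri = PLACEHOLDER.sub(substitute, pattern)
    return base_uri.rstrip("/") + "/" + relative_uri


if __name__ == "__main__":
    uri = expand_uri_pattern(
        "hairpins/@@diana_hairpins.mima_id@@",
        {"diana_hairpins.mima_id": "MI0000005"},
        "http://example.org/resource/",  # hypothetical base URI
    )
    print(uri)
```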

We used D2R as a full-fledged Linked Open Data server.
The size of the LOD base is around 100 million triples.

5.2 LOD publishing 

The miRNA LOD schema has been designed around four
core classes: Hairpin, Mature, ProteinGene and Transcript
(defined as ClassMap entities in D2R; see previous
subsection). Figure 2 shows an overview of the schema adopted
and part of the mappings used. Consider, for example, the class
Mature. Resources of that class are assigned URIs of the
form http://.../resource/matures/[Matures.mimat], where
Matures.mimat gets values from column mimat of Table Matures.
Some of the class properties are: name, species, relatedKegg,
targetsProteinGene (defined as PropertyBridge entities in
D2R; see previous subsection).

Consider also the property targetsProteinGene that relates
matures with genes (targets). Note that ProteinGene resources
are assigned URIs of the form
http://.../resource/proteingenes/[ProteinGenes.ensgid], where
ProteinGenes.ensgid gets values from column ensgid of Table
ProteinGenes. For a given Mature URI, to calculate the
URIs of related ProteinGene resources, the mapping definition
should include the following join:
Matures.mimat = MicroT5Interactions.mimat AND
MicroT5Interactions.tid = Transcripts.tid AND
Transcripts.enstid = ProteinGenes.enstid.
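
The join above can be tried out on a toy database. The following Python sketch uses an in-memory SQLite database with only the columns involved in the join; the table contents, the helper target_gene_uris and the base URI http://example.org/resource/ are invented for illustration and do not come from the miRNA database.

```python
import sqlite3

# Toy schema restricted to the join columns; all rows are made-up placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Matures (mimat TEXT PRIMARY KEY);
CREATE TABLE Transcripts (tid INTEGER PRIMARY KEY, enstid TEXT);
CREATE TABLE ProteinGenes (ensgid TEXT, enstid TEXT);
CREATE TABLE MicroT5Interactions (mimat TEXT, tid INTEGER);

INSERT INTO Matures VALUES ('MIMAT_A'), ('MIMAT_B');
INSERT INTO Transcripts VALUES (1, 'ENST_TOY_1'), (2, 'ENST_TOY_2'), (3, 'ENST_TOY_3');
INSERT INTO ProteinGenes VALUES
    ('ENSG_TOY_1', 'ENST_TOY_1'),
    ('ENSG_TOY_2', 'ENST_TOY_2'),
    ('ENSG_TOY_1', 'ENST_TOY_3');
INSERT INTO MicroT5Interactions VALUES
    ('MIMAT_A', 1), ('MIMAT_A', 2), ('MIMAT_A', 3), ('MIMAT_B', 2);
""")


def target_gene_uris(conn, mimat, base_uri):
    """Return the ProteinGene URIs targeted by one mature, via the three-way join."""
    rows = conn.execute(
        """
        SELECT DISTINCT ProteinGenes.ensgid
        FROM Matures
        JOIN MicroT5Interactions ON Matures.mimat = MicroT5Interactions.mimat
        JOIN Transcripts ON MicroT5Interactions.tid = Transcripts.tid
        JOIN ProteinGenes ON Transcripts.enstid = ProteinGenes.enstid
        WHERE Matures.mimat = ?
        ORDER BY ProteinGenes.ensgid
        """,
        (mimat,),
    )
    return [f"{base_uri}proteingenes/{ensgid}" for (ensgid,) in rows]


if __name__ == "__main__":
    for uri in target_gene_uris(conn, "MIMAT_A", "http://example.org/resource/"):
        print(uri)
```

DISTINCT is used because a gene with several targeted transcripts would otherwise be returned once per transcript.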

To link our LOD to the LOD cloud, we provide owl:sameAs
links to appropriate biological LOD infrastructures. See, for
example, the Bio2RDF¹³ data source, which provides RDF
descriptions for transcripts, tissues, keggs, and species.

5.2.1 Change and version management 

One of the major research problems in LOD publishing 
is how to deal with linked data that changes over time. 
While handling changes for information resources is rather 
straightforward, handling changes for non-information re- 
sources is a challenging issue. Key requirements for dealing 
with changes in miRNA LOD are the following: 

• Biologists that care only about the current state of
data should be able to browse or query the miRNA LOD base
easily to get up-to-date data. Also, up-to-date data should be
easily retrieved using SPARQL.

• Biologists should be able to query historic miRNA data,
and navigate through versions. Also, miRNA changes should be
treated as first-class citizens so that one can form SPARQL
queries that involve change resources, and trace those changes
and their effects.

13 http://bio2rdf.org

Figure 2: Publishing miRNA data as LOD data: RDFS to database mappings (up-to-date data).

[Figure 3 diagram: each version of a Hairpin or Mature resource
is identified by a versioned URI of the form
http://.../resource/hairpins/[Hairpins.accession]/[Versions.curVer]
or http://.../resource/matures/[Matures.mimat]/[Versions.curVer].
Consecutive versions are connected through the nextVersion and
prevVersion properties, whose URIs are built with
[Versions.nextVer] and [Versions.prevVer]. The evolving
properties name and sequence take their values from
[HairpinsHistory.name] and [HairpinsHistory.seq] (for matures,
from [MaturesHistory.name] and [MaturesHistory.seq]), and
producesMature is derived from [MaturesHistory.parHairp].
Values are restricted to the history records whose firstApp to
lastApp interval contains [Versions.curVer]. Legend: OWL class,
OWL property, evolving property, [table.x] = values from
column x.]

Figure 3: Publishing miRNA data as LOD data: RDFS to database mappings (historic data).

Browsing and querying up-to-date miRNA data. Using the D2R
browsing facilities, biologists can navigate through the miRNA
LOD base, exploring hairpin, mature, gene or transcript
resources and their descriptions. All data provided refer to
the current version of the miRNA database. Also, any resource
URI refers to the current version of that resource. This is
ensured because all triples involving resources from the
Hairpin, Mature, ProteinGene and Transcript classes are
populated from the core and join tables of Table 1, which are
kept up to date. Using the D2R SPARQL endpoint facilities,
biologists can pose SPARQL queries to the miRNA LOD. Whenever
a resource URI is used in a query, it refers to the current
version of that resource. To get up-to-date results, the query
should include a property that excludes out-of-date triples
(diana:label "now" in the query below). For example, the
following SPARQL query retrieves 10 hairpins, and their
sequences, that are located in chromosome X from the current
version of miRNA LOD:

SELECT ?h ?s WHERE {
    ?h rdf:type diana:Hairpin .
    ?h diana:sequence ?s .
    ?h diana:chromosome "X" .
    ?h diana:label "now" . } LIMIT 10

Figure 4: Resource description of mature MIMAT0010008 at version 16.0

Browsing and querying historic miRNA data. Out-of-date
resource descriptions are retrieved using the following
pattern for URIs: URI/{version number}. For example, the
URI http://.../resource/hairpins/MI0000044/8.0 gets
the RDF description of hairpin MI0000044 in version 8.0 of
miRBase. To pose the previous SPARQL query on that version of
miRBase, one should replace ?h diana:label "now" .
with ?h diana:version "8.0" . Note that we provide properties
(diana:nextVersion, diana:prevVersion) to move to
the next and the previous version of a resource description.
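
The substitution between the current-version and the historic form of the query can be sketched in Python as follows. This is only an illustration: the namespace URI bound to the diana prefix and the endpoint address are placeholders (neither is given in full in this paper), and hairpins_on_chromosome and request_url are hypothetical helper names. The only protocol detail relied upon is that a SPARQL endpoint accepts the query text in the query parameter of a GET request.

```python
from urllib.parse import urlencode

# Placeholders: the real vocabulary namespace and endpoint are not spelled out here.
DIANA_NS = "http://example.org/vocab/diana#"
ENDPOINT = "http://example.org/sparql"


def hairpins_on_chromosome(chromosome, version=None, limit=10):
    """Build the hairpin query for the current data or for a past miRBase version."""
    if version is None:
        # Up-to-date data is selected with the "now" label.
        selector = '?h diana:label "now" .'
    else:
        # Historic data is selected by version number instead.
        selector = f'?h diana:version "{version}" .'
    return (
        "PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>\n"
        f"PREFIX diana: <{DIANA_NS}>\n"
        "SELECT ?h ?s WHERE {\n"
        "  ?h rdf:type diana:Hairpin .\n"
        "  ?h diana:sequence ?s .\n"
        f'  ?h diana:chromosome "{chromosome}" .\n'
        f"  {selector}\n"
        f"}} LIMIT {int(limit)}"
    )


def request_url(query):
    """Encode the query as a GET request URL for the (placeholder) endpoint."""
    return ENDPOINT + "?" + urlencode({"query": query})


if __name__ == "__main__":
    print(hairpins_on_chromosome("X"))
    print(request_url(hairpins_on_chromosome("X", version="8.0")))
```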

To be able to provide the property values and URIs which
are valid at a certain version, we exploit the version
information present in the history tables HairpinsHistory and
MaturesHistory (see Tables 2 and 3). Figure 3 shows an
overview of the schema adopted and part of the mappings
used to manage changes and versions. For example, given
a current version curVer, to retrieve the valid value for the
name property of a hairpin, we should define a conditional
mapping to focus the retrieval on values that remain
unchanged for a time period that starts at or before curVer
and ends at or after curVer (similarly for, e.g., mature
names).
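
The validity condition described above amounts to an interval test on the history tables. The following Python sketch applies it to the HairpinsHistory sample records of Table 2, loaded into an in-memory SQLite database. It is a simplified illustration rather than the actual D2RQ conditional mapping; the helper state_at is hypothetical, and the change column of Table 2 is named change_type here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE HairpinsHistory (
        mima_id TEXT, change_type TEXT, name TEXT, seq TEXT,
        first_appearance INTEGER, last_appearance INTEGER
    )
    """
)
# Sample records copied from Table 2 (ids and sequences abbreviated as in the table).
conn.executemany(
    "INSERT INTO HairpinsHistory VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("..1364", "NEW", "dre-mir-10b", "..X..", 13, 15),
        ("..1364", "NAME", "dre-mir-10b-1", "..X..", 16, 17),
        ("..1364", "SEQ", "dre-mir-10b-1", "..Y..", 18, 19),
        ("..1364", "SEQ", "dre-mir-10b-1", "..Z..", 20, 32),
    ],
)


def state_at(conn, mima_id, version):
    """Return (name, seq) valid at `version`, or None if the hairpin did not exist."""
    return conn.execute(
        """
        SELECT name, seq
        FROM HairpinsHistory
        WHERE mima_id = ?
          AND first_appearance <= ?
          AND last_appearance >= ?
        """,
        (mima_id, version, version),
    ).fetchone()


if __name__ == "__main__":
    for version in (12, 14, 17, 25):
        print(version, state_at(conn, "..1364", version))
```

Because the validity intervals of one hairpin do not overlap, at most one record satisfies the condition for a given version.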

Each hairpin or mature resource description includes
properties that capture the changes affecting those resources.
For each change, we track its effect and its cause.
Figure 4 shows the description of mature MIMAT0010008
at version 16. The following SPARQL query retrieves 10
hairpins that were deleted or replaced in version 1.3 of
miRBase, and the URIs of the change operations:

SELECT ?h ?d ?c WHERE {
    ?h rdf:type diana:Hairpin .
    { { ?h diana:changeDelete ?d . } UNION { ?h diana:changeForward ?c . } }
    ?h diana:version "1.3" . } LIMIT 10

6. CONCLUSION AND FURTHER WORK 

In this work we presented a case study of publishing ge- 
nomic and experimental data related to microRNA biomo- 
lecules as Linked Open Data. Legacy data from two well- 
known microRNA databases with experimental data and ob- 
servations, as well as change and version information about 
microRNA entities, are fused and exported as LOD. The 
miRNA LOD server helps biologists explore biological
entities and navigate between previous/next versions of the
resources, and also provides a SPARQL endpoint for
applications to query historical miRNA data and track changes. As
future work, we plan to expand the LOD set with resources 
available from gene databanks, and also to implement ma- 
terialized approaches (i.e., using a native RDF store). 